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Abstract. Indirect inference (II) is a methodology for estimating the 
parameters of an intractable (generative) model on the basis of an al¬ 
ternative parametric (auxiliary) model that is both analytically and 
computationally easier to deal with. Such an approach has been well 
explored in the classical literature but has received substantially less 
attention in the Bayesian paradigm. The purpose of this paper is to 
compare and contrast a collection of what we call parametric Bayesian 
indirect inference (pBII) methods. One class of pBII methods uses ap¬ 
proximate Bayesian computation (referred to here as ABC II) where 
the summary statistic is formed on the basis of the auxiliary model, 
using ideas from II. Another approach proposed in the literature, re¬ 
ferred to here as parametric Bayesian indirect likelihood (pBIL), uses 
the auxiliary likelihood as a replacement to the intractable likelihood. 
We show that pBIL is a fundamentally different approach to ABC II. 
We devise new theoretical results for pBIL to give extra insights into its 
behaviour and also its differences with ABC II. Furthermore, we exam¬ 
ine in more detail the assumptions required to use each pBII method. 
The results, insights and comparisons developed in this paper are il¬ 
lustrated on simple examples and two other substantive applications. 
The first of the substantive examples involves performing inference for 
complex quantile distributions based on simulated data while the sec¬ 
ond is for estimating the parameters of a trivariate stochastic process 
describing the evolution of macroparasites within a host based on real 
data. We create a novel framework called Bayesian indirect likelihood 
(BIL) that encompasses pBII as well as general ABC methods so that 
the connections between the methods can be established. 

Key words and phrases: Approximate Bayesian computation, likeli¬ 
hood-free methods, Markov jump processes, quantile distributions, sim¬ 
ulated likelihood. 
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Approximate Bayesian computation (ABC) now 
plays an important role in performing (approximate) 
Bayesian inference for the parameter of a proposed 
statistical model (called the generative model here) 
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that has an intractable likelihood. Despite the in¬ 
tense attention ABC has recently received, the ap¬ 
proach still suffers from several drawbacks. An ob¬ 
vious disadvantage is the usual necessity to reduce 
the data to a low dimensional summary statistic. 
This leads to a loss of information that is difficult 
to quantify. The second, often less severe but some¬ 
times related, drawback is the computational chal¬ 
lenge of achieving stringent matching between the 
observed and simulated summary statistics. 

In situations where an alternative parametric 
model (referred to as an auxiliary model) can be for¬ 
mulated that has a tractable likelihood, the method¬ 
ology known as indirect inference (II) (see, e.g., 
Gourieroux, Monfort and Renault, 1993 and Heg- 
gland and Frigessi (2004)) is applicable. II has 
been thoroughly examined in the classical frame¬ 
work. Most methods differ in the way that observed 
and simulated data are compared via the auxil¬ 
iary model. We expand on this later in the article. 
For the moment, we note that some key references 
are Gourieroux, Monfort and Renault (1993), Smith 
(1993) and Gallant and Tauchen (1996). 

However, II has been far less studied in the 
Bayesian paradigm. Drovandi, Pettitt and Faddy 
(2011) developed an ABC approach that uses II to 
obtain summary statistics. In particular, the esti¬ 
mated parameter of the auxiliary model fitted to 
the data becomes the observed summary statistic. 
We adopt a similar naming convention to Gleim 
and Pigorsch (2013) and refer to this method as 
ABC IP where “I” stands for “indirect” and “P” 
stands for “parameter”. Gleim and Pigorsch (2013) 
also list another method, ABC IL (where “L” stands 
for “likelihood”), which is essentially an ABC ver¬ 
sion of Smith (1993). This approach follows ABC IP 
in the sense that the parameter estimate of the aux¬ 
iliary model is again the summary statistic. How¬ 
ever, the ABC discrepancy is based on the auxiliary 
likelihood, rather than a direct comparison of the 
auxiliary parameters. 

Gleim and Pigorsch (2013) advocate a slightly dif¬ 
ferent approach to ABC with II, which is effectively 
an ABC version of the classical approach in Gal¬ 
lant and Tauchen (1996). Here, Gleim and Pigorsch 
(2013) use the score vector based on the auxiliary 
model as the summary statistic, which is referred to 
as ABC IS (here “S” stands for “score”). The pa¬ 
rameter value used in the score is given by the MLE 
of the auxiliary model fitted to the observed data. 


This approach can be far cheaper from a compu¬ 
tational point of view since it avoids an expensive 
fitting of the auxiliary model to each dataset sim¬ 
ulated from the generative model required in ABC 
IP and ABC IL. 

Throughout the paper, the collection of ap¬ 
proaches that use the parametric auxiliary model 
to form summary statistics is referred to as ABC II 
methods. An advantage of this approach over more 
traditional summary statistics is that some insight 
can be gained on the potential utility of the II sum¬ 
mary statistic prior to the ABC analysis. Addition¬ 
ally, if the auxiliary model is parsimonious, then the 
summary statistic can be low-dimensional. 

Gallant and McCulloch (2009) (see also Reeves 
and Pettitt (2005)) suggest an alternative approach 
for combining II with Bayesian inference. This 
method has similar steps to ABC IP and ABC IL 
but essentially uses the likelihood of the auxiliary 
model as a replacement to the intractable generative 
model likelihood. We note here that this is a funda¬ 
mentally different approach as it is not a standard 
ABC method. In particular, there is no comparison 
of summary statistics and no need to choose an ABC 
tolerance. Here, we refer to this method as paramet¬ 
ric Bayesian indirect likelihood (pBIL). The focus of 
this paper is the application of a parametric auxil¬ 
iary model for the full data, which we refer to as 
pdBIL (where “d” stands for “data”). However, the 
ideas in this paper are equally applicable if a para¬ 
metric model is applied at the summary statistic 
(not necessarily obtained using ABC II techniques) 
level (i.e., some data reduction technique has been 
applied; see Blum et al. (2013), for a review). This 
is referred to as psBIL (where “s” stands for “sum¬ 
mary statistic”). We show that the Bayesian version 
of the synthetic likelihood method of Wood (2010) 
is a psBIL method. In the paper, we refer to the 
collection of ABC II and pBIL approaches as pBII 
methods (“Bayesian” “Indirect” “Inference” using a 
“parametric” auxiliary model). 

In the process of reviewing these pBII methods, 
we create a novel framework called Bayesian indirect 
likelihood (BIL) which encompasses pBII as well as 
ABC methods generally. In particular, if a specific 
nonparametric auxiliary model is selected (npBIL) 
instead of a parametric one (pBIL), then the general 
ABC method is recovered. A nonparametric kernel 
can be applied either at the full data (npdBIL) or 
summary statistic (npsBIL) level. The ABC II ap¬ 
proaches are thus a special type of npsBIL method 
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Fig. 1 . The general BIL framework. Here, the rounded rectangles indicate particular instances of methods. The dashed arrows 
indicate that there are multiple instances of that class of method in the literature that are not placed on this diagram. The 
dashed larger rounded rectangles indicate the methods that this paper focusses on. That is, BII methods that make use of, in 
some way, a parametric auxiliary model (so-called pBII methods). 


where the summary statistic is formed on the basis 
of a parametric auxiliary model. This framework is 
shown in Figure 1, which also highlights the meth¬ 
ods that this paper addresses. 

This article does not develop any new algorithms 
for pBII. However, this paper does make several in¬ 
teresting and useful contributions. Firstly, we ex¬ 
plore the pdBIL method in more detail theoretically, 
and recognise that it is fundamentally different to 
ABC II. The behaviour of this method is also sub¬ 
stantially different. A technique sometimes applied 
with classical II methods is to increase the simulated 
dataset size beyond that of the observed data in or¬ 
der to reduce the variability of the estimated quan¬ 
tities under the auxiliary model (see, e.g., Smith 
(1993); Gourieroux, Monfort and Renault (1993); 
Gallant and McCulloch (2009)). We demonstrate 
that pdBIL and ABC II behave differently for in¬ 
creasing size of the simulated data. Our second con¬ 
tribution is to compare the assumptions required for 


each pBII approach. Our theoretical and empirical 
results indicate that the pBIL method will provide 
good approximations if the auxiliary model is suffi¬ 
ciently flexible to estimate the likelihood of the true 
model well based on parameter values within re¬ 
gions of non-negligible posterior probability. ABC II 
methods rely on the parameter estimate or the score 
of the auxiliary model to provide a near-sufficient 
summary statistic. Finally, our creation of the gen¬ 
eral BIL framework provides a clear way to see the 
connections between pBII and other methods. 

The paper is organised as follows. In Section 2, 
the notation used throughout the paper is defined. 
The ABC II methods are reviewed in Section 3. 
The pBIL approach is presented in Section 4. The 
theoretical developments in this section, which of¬ 
fer additional insight into the pBIL approximation, 
are new. In addition, this section demonstrates how 
the synthetic likelihood approach of Wood (2010) 
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is a pBIL method on the summary statistic level. 
Section 5 shows how ABC can be recovered as a 
BIL method via a nonparametric choice of the auxil¬ 
iary model. Section 6 provides a comparison between 
ABC II and pdBIL. The contributions of this article 
are demonstrated on examples with varying com¬ 
plexity in Section 7. The highlights of this section 
include improved approximate inferences for quan¬ 
tile distributions and a multivariate Markov jump 
process explaining the evolution of parasites within 
a host. The article concludes with a discussion in 
Section 8. 

2. NOTATION 

Consider an observed dataset y taking values in 
Y of dimension N assumed to have arisen from 
a generative model with an intractable likelihood 
p(y\9), where 6 € © is the parameter of this model. 
Intractability here refers to the inability to com¬ 
pute p(y\6) pointwise as a function of 0. We as¬ 
sume that there is a second statistical model that 
has a tractable likelihood function. We denote the 
likelihood of this auxiliary model by fL4(y|0)> where 
denotes the parameter of this auxiliary model. 
There does not necessarily need to be any obvious 
connection between 6 and 0. The auxiliary model 
could be purely a data analytic model that does not 
offer any mechanistic explanation of how the ob¬ 
served data arose. The parameter estimate of the 
auxiliary model when fitted to the observed data is 
given by 0(y). Assuming a prior distribution on the 
parameters of the generative model, p(0), our inter¬ 
est is in sampling from the posterior distribution, 
p{6 1y), or some approximation thereof. 

We denote data simulated from the model as 
xsY. In this paper, we also consider the effect of us¬ 
ing n independent replicates of data simulated from 
the model, which we denote xi :n = (xi,..., x n ) and 
define p(xi :n |0) = nr=iP( x *l^)- Therefore the total 
size of xi :n is nN. Note that this could also relate 
to a stationary time series simulation of length nN 
(see, e.g., Gallant and McCulloch (2009)). In the 
case of a stationary time series or independent and 
identically distributed (i.i.d.) data it is not a require¬ 
ment for the simulated dataset size to be a multiple 
of the observed data size. However, for the sake of 
simplicity we restrict n to be a positive integer. 

Since the likelihood of the auxiliary model is 
tractable, we can potentially consider richly param- 
eterised statistical models to capture the essential 


features of the data. We assume throughout the pa¬ 
per that the dimensionality of the auxiliary model 
parameter is at least as large as the dimension¬ 
ality of the generative model parameter, that is, 
dim(0) > dim(0). Jiang and Turnbull (2004) note 
that it still may be possible to obtain useful esti¬ 
mates of a subset of the parameters when this as¬ 
sumption does not hold. We do not consider this 
here. 

3. APPROXIMATE BAYESIAN 
COMPUTATION WITH INDIRECT 
INFERENCE 

3.1 Approximate Bayesian Computation 

ABC is widely becoming a standard tool for per¬ 
forming (approximate) Bayesian inference on statis¬ 
tical models with computationally intractable like¬ 
lihood evaluations but where simulation is straight¬ 
forward. ABC analyses set n = 1 so that the simu¬ 
lated dataset is the same size as the observed. How¬ 
ever, for the purposes of this paper, we relax this 
commonly applied restriction for the moment. 

In ABC, a summary statistic is defined by a col¬ 
lection of functions s n : Y n -A S, for each n € N. 
Henceforth, the subscript n is omitted to ease pre¬ 
sentation. Proposed parameter values that produce 
a simulated summary statistic, s(xi :n ), “close” to 
the observed summary statistic, s(y), are given 
more weight. Here, we define “close” by the discrep¬ 
ancy function p(s(xi :n ), s(y)) and a kernel weight¬ 
ing function, JC £ (/?(s(xi :n ), s(y))), where £ is a band¬ 
width referred to as the ABC tolerance. The ABC 
target distribution is given by 

(1) p £;n (6>|y) ocp(0)p £ , n (y|0), 

where 

Pe,n{y\9)oc / p(xi :n \d)K £ (p(s(x 1:n ),s(y)))dx 1:n , 

J Y" 

is referred to here as the ABC likelihood. It can be 
shown that if s(-) is a sufficient statistic and n = 1 
then p £ ,i(0|y) -A p{0\y) as e -a 0 (Blum, 2010). 
Unfortunately, ABC cannot be trusted when the 
value of n is increased in the sense that the target 
Pe,n(9\y) can move further away from p{6 |y). In the 
simple example in Appendix A of the supplemental 
article (Drovandi, Pettitt and Lee, 2014), the pos¬ 
terior distribution for a univariate 9 converges to a 
point mass centred on the single observation y as 
n -A oo and e -A 0 (see also Drovandi (2012), pages 
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28-29, for a similar example). We also verify this be¬ 
haviour empirically on a toy example. This suggests 
that whilst it is tempting to increase the simulated 
dataset size to reduce the variability of the simulated 
summary statistic, such an approach is fraught with 
danger. 

Standard ABC procedures can make use of n sim¬ 
ulated datasets but in a different way. Here the ABC 
likelihood is estimated via n -1 -Ke(p(s(xi), 
s(y))). However, we note that this is an unbiased 
estimate (regardless of n) of a more standard ABC 
likelihood, and thus this process does not alter the 
ABC approximation (Andrieu and Roberts, 2009). 
Therefore, for the remaining presentation in this 
section we set n = 1. However, increasing n may im¬ 
prove the performance of ABC algorithms and allow 
a smaller value of e for the same computational cost. 
We do not investigate this here. 

How the parameter value is proposed depends on 
the chosen ABC algorithm. Here, we use Markov 
chain Monte Carlo (MCMC ABC, Marjoram et al. 
(2003)) with the proposal distribution q chosen care¬ 
fully, and sometimes based on previous ABC analy¬ 
ses. The approach is shown in Algorithm 1 for com¬ 
pleteness. In the algorithm, T is the number of itera¬ 
tions. We choose 0°, the initial value of the chain, so 
that it is well supported by the target distribution. 
This initial value may come from a previous analysis 
or by performing some preliminary runs. This algo¬ 
rithm is mainly used for simplicity, although it is 
important to note that other ABC algorithms could 
be applied. 

One difficult aspect in ABC is that often some 
form of data reduction is required, to avoid the curse 
of dimensionality associated with comparing all the 


data at once (Blum, 2010). This choice of summary 
statistic is therefore crucial for a good approxima¬ 
tion, even when e can be reduced to practically 0. 
In some applications, it is possible to propose an¬ 
other model that provides a good description of the 
data. The summary statistic used in ABC can be 
formulated based on this auxiliary model. These ap¬ 
proaches are summarised below. 

3.2 ABC IP 

Drovandi, Pettitt and Faddy (2011) suggest using 
the parameter estimate of the auxiliary model as 
the summary statistic to use in ABC. Data are simu¬ 
lated from the generative model based on a proposed 
parameter value, then the auxiliary model parame¬ 
ter is estimated based on this simulated data. The 
way the auxiliary parameter is estimated provides a 
mapping between the generative model and the aux¬ 
iliary model parameters. The ABC algorithm uses a 
noisy mapping between 0 and 0 through the simu¬ 
lated data, x, generated on the basis of 0, 0(0,x). 
For this purpose (explained below), we use the max¬ 
imum likelihood estimate (MLE) of the auxiliary 
model 

0(0, x) =argmaxpA(x|0), 

0e<i> 

where x~p(-|0). ABC IP relies on the following 
assumption: 

Assumption 1 (ABC IP Assumptions). The 
estimator of the auxiliary parameter, 0(0, x), is 
unique for all 0 with positive prior support. 

It is important to note that ABC IP (as well 
as other ABC approaches below) use n = 1 so the 


Algorithm 1 MCMC ABC algorithm of Marjoram et al. (2003). 


Set 0° 

Simulate x° ~p(-|0°) 
for i = 1 to T do 

Draw 0* ~ g(-|0' _1 ) 
Simulate x* ~p(-|0* 

Compute r = min(l, 


if uniform(0,1) < r then 


p(0«) g (0 < - 1 |e«)^ e (p(s(x»), a (y))) x 
p(e- 1 ) 9 (0*|0*- 1 )Jf e (p(s(x‘-i), S (y)))^ 


8 

9 

10 

11 

12 


6 l = 0* and x* = x* 
else 

0* = 0 i_1 and x ?: = x*" 1 

end if 
end for 
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approximation quality of the method can depend 
on the statistical efficiency of the estimator 
based on this finite sample. Additionally, the MLE is 
asymptotically sufficient (Cox and Hinkley (1979), 
page 307). For this reason, we advocate the use of 
the MLE in general as it is typically more efficient 
than other estimators like sample moments (for the 
auxiliary model). Sample moments may be compu¬ 
tationally easier to obtain, but are likely to result 
in a poorer ABC approximation if the statistical ef¬ 
ficiency is lower than the MLE. We note that the 
optimal choice of auxiliary estimator (trading off be¬ 
tween computational effort and statistical efficiency) 
may be problem dependent. An additional compli¬ 
cation is that the auxiliary model is fitted based on 
data generated from a different model, x r\_; PM#)- 
Therefore, the efficiency of 0(x, 0) should be based 
on p(x \0) not p^(x \(f>) (see, e.g., Cox (1961)). This 
can be investigated by simulation. 

In Section 7.2, we provide an example where the 
auxiliary model does not satisfy Assumption 1, cre¬ 
ating difficulties for ABC IP. The ABC II methods 
(and the pdBIL method) that follow do not neces¬ 
sarily require unique auxiliary parameter estimates. 

An advantage of the II approach to obtaining sum¬ 
mary statistics is that the summary statistic will 
likely be useful if the auxiliary model fits the ob¬ 
served data (see Section 3.5 for more discussion). In 
the case of ABC IP, the (approximate) covariance 
matrix of the auxiliary parameter estimate based on 
the observed data can be estimated by the inverse of 
the observed information matrix [we denote this in¬ 
formation matrix by J(^>(y))]. Intuitively, we expect 
this discrepancy function to be more efficient than 
a Euclidean distance, as it can take into account 
the variability of summary statistics and the cor¬ 
relations between summary statistics. Denoting the 
observed summary statistic as 0(y) and the sim¬ 
ulated summary statistic as 0(x) (dropping 6 for 
notational convenience), we use the following dis¬ 
crepancy for ABC IP: 

p(s(x),s(y)) 

= \!(</>(*) - 0(y)) T J(0(y))(^(x) - <f>{ y)). 

It is important to note that this is essentially an 
ABC version of the classical approach in Gourier- 
oux, Monfort and Renault (1993). A more appro¬ 
priate weighting matrix may involve considering the 
variance of 0(x, 6) when the data are generated un¬ 
der an alternative model, x~p(-|0) (Cox (1961) 
provides a result, the so-called sandwich estimator). 


3.3 ABC IL 

Gleim and Pigorsch (2013) also describe an ap¬ 
proach that uses the auxiliary likelihood to set up 
an ABC discrepancy. Here, the ABC discrepancy is 

p( s ( x )> s (y)) = i°gPA(y|0(y)) -iogp j4 (y|0(x)). 

This is effectively an ABC version of the clas¬ 
sical approach of Smith (1993). We note that 
Pa( y|<£(y)) w iH remain unchanged throughout the 
algorithm and provides an upperbound for val¬ 
ues of PA(y|</>( x )) obtained for every simulated 
dataset. ABC IL uses the same summary statistic 
as ABC IP but uses a discrepancy based on the 
likelihood rather than the Mahalanobis distance. 
Note that the discrepancy function for ABC IP ap¬ 
pears in the second-order Taylor series approxima¬ 
tion of logpA(y|0(x)) about logpA(y|0(y)) (Davi¬ 
son (2003), page 126) assuming standard regular¬ 
ity conditions for PA(y\4>{y)) and 4>{ y)- The ABC 
tolerance could be viewed here as a certain cut-off 
value of the auxiliary log-likelihood. The ABC IL 
approach relies on the following assumption. 

Assumption 2 (ABC IL Assumptions). The 
auxiliary likelihood evaluated at the auxiliary esti¬ 
mate, PA(y|0(x, 0)), is unique for all 0 with positive 
prior support. 

We note that this assumption can still be satis¬ 
fied even when the auxiliary model does not have a 
unique MLE (see Section 7.2 for an example). 

The ABC IP and ABC IL methods use parameter 
estimates of the auxiliary model as summary statis¬ 
tics and can thus be expensive as it can involve a 
numerical optimisation every time data is simulated 
from the generative model. The next approach to 
obtaining summary statistics from II avoids this op¬ 
timisation step. 

3.4 ABC IS 

Gleim and Pigorsch (2013) advocate the use of the 
score vector of the auxiliary model evaluated at the 
auxiliary MLE, 0(y), as the summary statistic. We 
denote the score vector of the auxiliary model as 

c i aa ( dlo &'PA{y\4>) dlogp A (y\4>)\ T 

V d(pi ) 

where (f> = {<j >\,..., 0 d i m (</>) ) T • 

Each component of the summary statistic involv¬ 
ing the observed data and the MLE, Syi(y, 4>{ y)), is 
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assumed to be numerically 0 under standard regu¬ 
larity assumptions (see below, Assumption 3). Thus, 
the search is for parameter values of the generative 
model that lead to simulated data, x, that produces 
a score close to 0. Noting that the approximate co- 
variance matrix of the observed score is given by the 
observed information matrix J(<^(y)), the following 
ABC discrepancy is obtained for ABC IS 

p(s(x),s(y)) 

= \/sA(x,(/)(y)) i J(0(y))' 1 S y i(x,0(y)). 

This is essentially an ABC version of Gallant and 
Tauchen (1996). 

This approach is fast relative to ABC IP when the 
MLE of the auxiliary model is not analytic whilst 
the score is analytic since no numerical optimisation 
is required every time data are simulated from the 
generative model. Of course, it may be necessary 
to estimate the score numerically, which would add 
another layer of approximation and may be slower. 
In the examples of this paper, we are able to obtain 
the score analytically. ABC IS relies on the following 
assumptions. 

Assumption 3 (ABC IS Assumptions). The 
MLE of the auxiliary model fitted to the observed 
data, </>(y), is an interior point of the parameter 
space of 4> and J(</>(y)) is positive definite. The 
log-likelihood of the auxiliary model, logpA( , | < ^); is 
differentiable and the score, Sa(x, 0(y)), is unique 
for any x that may be drawn according to any 6 
that has positive prior support. 

We note that Assumption 3 is generally weaker 
than Assumption 1 (ABC IP), since it may still hold 
even if the MLE of the auxiliary model is not unique. 

3.5 Discussion on ABC II Summary Statistics 

Only models in the exponential family possess a 
minimal sufficient statistic with dimension equal to 
that of dim(0). For other models, under suitable 
conditions, the Pitman-Koopman-Darmois theo¬ 
rem states that the dimension of any sufficient 
statistic increases with the sample size. For many 
complex models, such as those considered in the 
ABC setting, the minimal sufficient statistic will be 
the full dataset (or the full set of order statistics if 
the data are i.i.d.). The summary statistic produced 
by ABC II will always have dimension dim(0), and 
thus will not produce sufficient statistics in general 
(this argument of course carries over to any ABC 


method that uses some data reduction technique; 
see Blum et al. (2013), for a review). Intuitively, our 
suggestion is that the summary statistic produced 
by ABC II should carry most of the information 
contained in the observed data provided that the 
auxiliary model provides a good description of the 
data. Unfortunately, this is difficult to verify since 
it is usually not possible to quantify the amount 
of information lost in data reduction. Despite this, 
by conducting goodness-of-fit tests and/or residual 
analysis on the auxiliary model fit to the data will 
at least provide some guidance on the usefulness of 
the summary statistic produced by ABC II. This 
is in contrast to the more traditional approach of 
summarising based on simple functions of the data 
(e.g., Drovandi and Pettitt (2011)), whose utility is 
difficult to assess prior to running an ABC analysis 
without performing an expensive simulation study. 
Furthermore, ABC II methods provide natural dis¬ 
crepancy functions between summary statistics as 
shown above. Selecting the discrepancy function and 
determining appropriate weighting of the summary 
statistics in traditional ABC can be problematic. 

It is well known that the choice of summary 
statistic in ABC involves a compromise between 
sufficiency and dimensionality (Blum et al., 2013). 
A low-dimensional and near-sufficient summary 
statistic represents an optimal trade-off. Another 
advantage of ABC II over usual ABC is the dimen¬ 
sionality of the ABC II summary statistic can be 
controlled by selecting parsimonious auxiliary mod¬ 
els and using standard model choice techniques to 
choose between a set of possible auxiliary models 
[e.g., the Akaike information criterion (AIC) and 
the Bayesian information criterion (BIC)]. 

4. PARAMETRIC BAYESIAN INDIRECT 
LIKELIHOOD (pBIL) 

4.1 Parametric Bayesian Indirect Likelihood for 
the Full Data (pdBIL) 

Reeves and Pettitt (2005) and Gallant and Mc¬ 
Culloch (2009) propose a method that has similar 
steps to ABC IP and ABC IL but is theoretically 
quite different, as we show below. After data are 
simulated from the generative model, the auxiliary 
parameter is estimated. This auxiliary estimate is 
then passed into the auxiliary likelihood of the ob¬ 
served data. This likelihood is then treated in the 
usual way and fed into a Bayesian algorithm, for 
example, MCMC. One first defines a collection of 
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functions </> n : 0 x Y n -a 4>. The artificial likelihood 
is then defined as follows: 

~ n 

PA,n( y|0)= / PA{y\4>n{ d ^l:n))\\p{^i\0)dyL V , n , 

l\/n 

J Y i=1 

and the target distribution of this approach is given 
by 

PA,n(°\y) <*-PA,n(y\0)p(0), 

where the subscripts A and n denote the depen¬ 
dence of the target on the auxiliary model choice 
and the number of replicate simulated datasets, re¬ 
spectively. This approach is effectively a Bayesian 
version of the simulated quasi-maximum likelihood 
approach of Smith (1993). Smith (1993) proposes 
to maximise PA,n( y|0) with respect to 6. Instead 
of applying this as an ABC discrepancy as in the 
ABC IL method above, Reeves and Pettitt (2005) 
and Gallant and McCulloch (2009) treat this aux¬ 
iliary likelihood as a replacement to the likelihood 
of the generative model in the same way that Smith 
(1993) does. 

It is important to note that this approach does 
not perform a comparison of summary statistics, 
and hence there is no need to choose an ABC tol¬ 
erance. Thus, it is not a standard ABC algorithm. 
We refer to this approach simply as pdBIL, since 
we apply a “parametric” auxiliary model for the full 
“data”. When the full data have been summarised as 
a summary statistic, s(y), an alternative approach is 
to apply a parametric auxiliary model for the sum¬ 
mary statistic likelihood, p(s(y)|0) (see Section 4.2 
for more details). In Figure 1, these methods fall un¬ 
der the pBIL class. Within this class, the focus of 
this paper is on the pdBIL method. 

4.1.1 The pdBIL approximation The theoretical 
aspects of this approach are yet to be investigated in 
the literature. Some clues are offered in Reeves and 
Pettitt (2005) and Gallant and McCulloch (2009), 
but we formalise and extend the theory here. The 
subscript n is used to denote that this target remains 
an approximation to the true posterior distribution 
and that the approximate target may change with 
n (we show below that in general the target does 
depend on n). However, because it is not an ABC 
algorithm, it is unclear how pdBIL behaves as n in¬ 
creases. Gallant and McCulloch (2009) use a very 
large simulation size (n ~ 700), without a theoreti¬ 
cal investigation. 


With y fixed, we consider a potential limiting like¬ 
lihood PA(y\4>{@)) with associated posterior 

PA(0\y) <xp A (y\(f>(0))p(0). 

Note that the parameter of PA,n( y|0) is 0 € 0 but 
the parameter of PA(y\<t>(0)) is (f)(0) € 4). We em¬ 
phasise that the results below all assume that y is 
a fixed value in Y. 

To ease presentation, we define the random vari¬ 
able <f) g n = <f> n (0, Xi :n ) where (Xj) is a sequence 
of i.i.d. random variables distributed according to 
p(-\0), and we can write 

PA,n(y\ d ) = E[PA(y\</)0,n)\, 

where E is expectation with respect to the distribu¬ 
tion of Xi :n . 

The results below provide sufficient conditions un¬ 
der which, as nA oo, PA,n(- |y) -A-PA(-\y) pointwise 
and 

( f(0)pA,n( e \y) de ^ f f(0)PA(0\y)d0, 

Je Jo 

where / is some function of 6 whose posterior ex¬ 
pectation is of interest. This does not assume that 
PA(-\y) =p(-|y), which in general will not be the 
case. A useful tool to allow us to answer both ques¬ 
tions is provided by Billingsley (1999). 

Theorem 1 (Billingsley (1999), Theorem 3.5). 
If X n is a sequence of uniformly integrable random 
variables and X n converges in distribution to X then 
X is integrable and EX n -A EX. 

Remark 1. A simple sufficient condition for 
uniform integrability is that for some <5 > 0 , 

sup E(|X n | 1+ ' 5 ) < oo. 

n 

Result 1. Assume that pa, re (y|0) -a PA(y\4>(0)) 
as nAoo for all 6 with positive prior support, 
inf„ J e p A ,n(y\0)p(0)d0 > 0 and sup 0 6 $pA(y|</>) < 
oo. Then 

lim pA,n(0\y) =pa(0 |y). 

n —>-oo 

Furthermore, if /: 0 -a K is a continuous func¬ 
tion satisfying sup n f Q \f(0)\ 1+s pA,n(0\y) dO < oo 
for some 5 > 0 then 

lim f f(0)p A , n (0\y)d0=[ f(0)p A (0\y)d0. 

n ^°°Jo JO 
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Proof. The first part follows from the fact that 
the numerator of 


PA,n(0 ly) 


PA,n(y\ d )P( d ) 

J e PA,n(y\9)p(9)d9 


converges pointwise and the denominator is positive 
and converges by the bounded convergence theorem. 
For the second part, if for each n € N, 9 n is dis¬ 
tributed according to PA,n(- |y) and 0 is distributed 
according to PA(-\y) then 9 n converges to 0 in distri¬ 
bution as n -A oo by Scheffe’s lemma (Scheffe, 1947). 
Since / is continuous, f(6 n ) converges in distribu¬ 
tion to f(9) as n A oo by the continuous mapping 
theorem and we conclude by application of Theo¬ 
rem 1 . □ 


A simple condition for PA,n(y\9) PA(y\4>(9)) as 

n -A- oo to hold is provided by the following result. 

Result 2. Assume that PA(y\4>e n) converges in 
probability to pA(y\4>(9)) as n—>• oo. If 

supE[|p j4 (y|^e,n)| 1+ 1 <oo 

n 

for some 6 > 0 then pa,u( y|0) —>• pa( y|0(0)) as nA 
oo. 


Proof. The result follows by application of 
Theorem 1. □ 


Although the results above hold under conditions 
on the fixed, observed data y, they will often hold 
for a range of possible values of y. 

The function (f)(9) is often referred to as the map¬ 
ping or binding function in the II literature. In Gal¬ 
lant and McCulloch (2009), it is assumed that this 
function is 1-1 but the results above demonstrate 
that this is not a necessary condition for Result 2 
to hold. The following example where the auxiliary 
model is a mixture model demonstrates this princi¬ 
ple. 

Assume that the true model is IV (y; 9, 1) while the 
auxiliary model is a mixture of normals, wN(y,8\, 
1) + (1 — w)N(y\ 02, 1) so that <f> = (9\, 02, w). As¬ 
suming an infinite sample from the true model, there 
are an infinite number of MLEs of the auxiliary 
model; (f)(6) = (9,6,w) where 0 < w < 1 , (f)(9) = 
( 9 , 9 2,1) where —oo < 82 < 00 or 0(0) = (0i,0,O) 
where — 00 < 9\ < 00 . All of these possible mappings 
produce the same value of the auxiliary likelihood, 
which coincides with the value of the generative like¬ 
lihood. 

It is straightforward under the assumptions of Re¬ 
sults 1-2 to show that pdBIL will target the true 


posterior as n -A 00 if the true model is contained 
within the auxiliary model. When the generative 
model is a special case of the auxiliary model, us¬ 
ing the notation of Cox and Wermuth (1990), the 
auxiliary and generative parameter can be written 
as <f> = (9 e , 7 ) and 9 = (0 r ,7o) where e and r de¬ 
note “extended” and “reduced” respectively and 7 0 
is fixed. The proof of this result is straightforward. It 
involves demonstrating that 0(0,xi :n ) is consistent 
also for the parameter of the generative (reduced) 
model. Therefore, when n —> 00 the generative and 
auxiliary likelihoods will coincide. 

This theoretical result cannot typically be realised 
in practice since a model which incorporates an in¬ 
tractable model as a special case is likely to also 
be intractable. However, it does suggest that the 
auxiliary model be chosen to be adequately flexi¬ 
ble to give a good approximation of the generative 
likelihood for 9 values with positive prior support. 
In practice, our empirical evidence indicates that 
it is only necessary for the auxiliary likelihood to 
mimic the behaviour of the generative likelihood for 
the values of 9 with nonnegligible posterior support 
and for the auxiliary likelihood to be negligible in 
regions of the parameter space with little posterior 
support. If this is not the case, it is likely that the 
pdBIL method will lead to poor approximations (as 
we demonstrate in Section 7.1). 

If the binding function, (f)(9), were known, the 
pdBIL method would proceed straightforwardly. 
Since it will not be available in practice, it can be es¬ 
timated indirectly through the simulated data xi :n . 
From the above, it is desirable for the pdBIL method 
if the auxiliary likelihood is as close as possible to 
the true likelihood. Gallant and McCulloch (2009) 
show, for a particular choice of the auxiliary model, 
that choosing the MLE for 0(0, xi :n ) minimises the 
Kullback-Leibler divergence between the generative 
and auxiliary likelihoods. Furthermore, this choice 
will often lead to Results 1-2 holding. Therefore, 
we advocate the use of the MLE with this method. 
When the MLE of the auxiliary model is used, Cox 
(1961) provides an expression which defines 0(0) 
[see equation (25) of Cox (1961)]. 

It remains to be seen what the target of p.A,n(0|y) 
is relative to pa(9 |y). From an intuitive perspec¬ 
tive, increasing n leads to a more precise deter¬ 
mination of the mapping 0(0), and thus should 
lead to a better approximation. A more theoreti¬ 
cal argument is as follows. The approximations will 
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coincide with each other provided E\pA(y\<j>g n )\ = 
Pa{ y|0(0)) (Andrieu and Roberts, 2009). Unfortu¬ 
nately, even if one uses an unbiased estimator for the 
auxiliary parameter, that is E[0 e n ] = 4>{9) for any 
value of n, this result will still rarely hold in general. 
The likelihood can be viewed here as a nonlinear 
function of the auxiliary parameter estimate, and so 
E\pA{y\4>o,n)} ^ PA{y\<f>{0)) in general. Our empiri¬ 
cal evidence (see Section 7) suggests that pA,n{6 |y) 
typically becomes less precise relative to pa{6 |y) as 
the likelihood estimate becomes more noisy, that is, 
when n is reduced. The message of Results 1-2 is 
that, provided the auxiliary model is suitably cho¬ 
sen, a better approximation can be anticipated by 
taking n as large as possible, which has the effect of 
reducing the bias oi pA{y\4>o n)- 

4.1.2 MCMC pdBIL As an example, MCMC can 
be used to sample from the pdBIL target (referred to 
here as MCMC pdBIL, see Gallant and McCulloch 
(2009)). This approach is presented in Algorithm 2. 

As Results 1-2 suggest, the aim with pdBIL is to 
select n as large as possible. We demonstrate in the 
examples that it is desirable to consider values of 
n greater than one due to the improved statistical 
efficiency of MCMC pdBIL (and potentially other 
algorithms that implement pdBIL) when increasing 
n. Of course, the method will become computation¬ 
ally infeasible for very large n. 


4.2 Parametric Bayesian Indirect Likelihood for 
a Summary Statistic (psBIL) 

Wood (2010) proposes a method called synthetic 
likelihood when it is convenient to perform inference 
on the basis of a set of summary statistics rather 
than the full data. Considering Bayesian inference, 
the target distribution when the data have been re¬ 
duced to a summary statistic is given by 

p{6\s(y)) (xp(s(y)\0)p{6). 

The major issue with this construction is that there 
is no analytical form for the likelihood function 
of the summary statistic, p(s(y)|0). Wood (2010) 
overcomes this by applying, based on the termi¬ 
nology in this paper, a parametric auxiliary model 
for this probability distribution, PA(s(y)\<j>(0))- In 
our framework, an approach that applies a para¬ 
metric auxiliary model to form the likelihood of the 
summary statistic rather than the likelihood of the 
full data (as is presented in Section 4.1) is referred 
to here as psBIL (where “s” denotes “summary 
statistic”). Therefore the Bayesian version of Wood 
(2010) is a psBIL approach. For the auxiliary like¬ 
lihood, PA(s{y)\^>(6)), Wood (2010) considers us¬ 
ing the likelihood of a multivariate normal distribu¬ 
tion, S(0)), where 0(0) = (/r(0),X(0)) is 

the auxiliary parameter. As is the case with pdBIL, 
the binding function 0(0) is rarely known but can 
be estimated via simulated data from the generative 
model for a particular value of 0. Using our nota¬ 
tion, we obtain the following dataset from the true 


Algorithm 2 MCMC pdBIL algorithm (see also Gallant and McCulloch (2009)). 


1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 


Set 0° 

Simulate x*. n ~p(-|0°) 

Compute 0° = argmax0 g <j>p J 4(xi;ril0) 

for i = 1 to T do 

Draw 0* ~ g(-|0' _1 ) 

Simulate x*. n ~ p(-\6*) 

Compute 0(xj\ n ) = argmax0p j4 (x*. ri |0) 

Cnmnutr r — miufl 1 0 *)' 

Compute r - mm(l, PA{y w-i )p{g i-i )q{e *\ e i-i), 

if uniform(0, 1) < r then 

0 * = 9 * 

0* = 0( x l:n) 

else 

0 ' = 0*” 1 

0 * = 0- 1 

end if 
end for 
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model of the summary statistic, (s(xi),..., s(x n )). 
This represents n i.i.d. observations from s(-)|0. An 
advantage of selecting such a simple auxiliary model 
is that the MLE has the analytic form 

1 n 

Mx 1:n ,0) = -y' s ( x i)> 

n ' 
i=l 

1 n 

S(x 1:rl , 6 ) = - VVs(xj) - /i(xi -n, 0 )) 

n z —' 

i=l 

■ (s(xj) - A*(xi : „,0)) , 

where the superscript T denotes transpose. The aux¬ 
iliary likelihood used is then based on lV(/z(x i :n , 9), 
S(xi :n ,9)). Our results indicate that the target dis¬ 
tribution of this method will depend on re, and, if the 
auxiliary model for the summary statistic likelihood 
is reasonable, better approximations ofp(0|s(y)) are 
likely to be obtained for large re. 

5. ABC AS A BIL METHOD WITH 
NONPARAMETRIC AUXILIARY MODEL 

An alternative and perhaps natural candidate 
for pa is to use a kernel density estimate based 
on the samples xi :n . This corresponds to choosing 
0(0,xi :n ) = xi :n and we define 

1 n 

PA(y|0(0,xi :n )) =Pa( y|xi :n ) = -V'iC e (p(y,x i )), 

n z —' 

i=i 

(Diggle and Gratton, 1984), where e is the band¬ 
width parameter. We have then 

« n 

pa , u { y|0)= / PA(y|0(0,xi :n ))TTp(x J |0)dxi :n 

/ Yn 

3=1 

r y n n 

f=i 

- n r. n 

= -£/ -^e(p(y,xi))TTp(x i |0)dx 1:ri 

= f^K e (p(y,x))p(x.\6)dx 
=Ps(y\o), 

and this is exactly the form of the standard ABC 
likelihood. In addition, re does not affect the like¬ 
lihood (although it may help computationally in 


some algorithms) and e controls the level of ap¬ 
proximation. Here, we see that this is an esti¬ 
mate of the ABC likelihood where the compari¬ 
son is made between the full datasets. Here, we 
obtain the npdBIL approach as presented in Fig¬ 
ure 1 (where “d” corresponds to full “data”). Al¬ 
ternatively, a nonparametric density estimate of the 
auxiliary model of the summary statistic likelihood, 
PA(s(y)|0(xi :n ,0)), could be applied. Using a sim¬ 
ilar procedure to above, we obtain p £ (s(y)|0). We 
refer to this in Figure 1 as npsBIL (npBIL based 
on a “summary statistic”). This is the approach 
adopted by Creel and Kristensen (2013), however, 
their focus is on point estimation (posterior mean). 
Unfortunately, Creel and Kristensen (2013) refer to 
their Bayesian estimator as BIL, however, under our 
framework BIL is a more general class of methods. 
Of course, if the summary statistic is derived from 
some parametric auxiliary model, then the ABC II 
class of method is recovered as an npsBIL method. 
The reader is again referred to Figure 1 to see the 
connection between these methods. 

By selecting a parametric model for the auxil¬ 
iary likelihood (pBIL), we can potentially overcome 
the curse of dimensionality associated with the non¬ 
parametric aspect of ABC. This requires further re¬ 
search. Of course, finding a suitable parametric aux¬ 
iliary model may be challenging in practice. 

6. COMPARISON OF ABC II AND pdBIL 

There are a few remarks to be made about the 
above results in relation to theoretical comparisons 
between ABC II and pdBIL. 

Remark 2. Under suitable conditions better ap¬ 
proximations with pdBIL are obtained by increasing 
re. This is in stark contrast with ABC II, which can¬ 
not be trusted for re > 1. 

Remark 3. In the case where the true model 
is a special case of the auxiliary model, the pdBIL 
method will be exact in the limit as re —>• oo. In con¬ 
trast, in this ideal situation, ABC II still does not 
produce sufficient statistics (see the dimensionality 
argument in Section 3.5) and will not target the true 
posterior in the limit as e —> 0. An example is where 
the true model is a t-distribution with location, scale 
and degrees of freedom of p, a and 1, respectively. 
The auxiliary model is a more general t-distribution 
with degrees of freedom v. In this case, the pdBIL 
method is exact in the limit as re -> oo as the true 
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model is incorporated within the auxiliary model. 
Unfortunately, ABC II does not produce a suffi¬ 
cient statistic as the summary statistic will be of 
dimension three whilst it is known for this model 
the minimal sufficient statistic consists of all the or¬ 
der statistics. Of course, finding an auxiliary model 
that satisfies this condition in practice will rarely be 
feasible. 

Remark 4. Even if the auxiliary parameter es¬ 
timate or score happen to be a sufficient statistic 
for the generative model, pdBIL still will not gener¬ 
ally target the true posterior, as the auxiliary and 
generative likelihoods will still not match up. In this 
situation, the ABC II approaches will enjoy conver¬ 
gence to the true posterior as e A 0 whilst pdBIL 
will not converge to the true posterior as n —>■ oo. 
However, sufficient statistics are rarely achieved in 
practice. 

Remarks 3 and 4 demonstrate that pBII meth¬ 
ods generally will not (and rarely will) target the 
true posterior distribution asymptotically. This is 
generally the case for other techniques in the liter¬ 
ature for dealing with models that have intractable 
likelihood functions. There are some exceptions to 
this. For example, exact techniques are available 
for so-called doubly intractable models when per¬ 
fect simulation from the generative model is possible 
(e.g., Mpller et al. (2006); Murray, Ghahramani and 
MacKay (2006)). Furthermore, so-called pseudo¬ 
marginal methods (Andrieu and Roberts, 2009) are 
applicable when a positive and unbiased estimator 
of the likelihood is available and is a current area of 
research. Despite not being exact, we demonstrate 
that pBII methods can produce quite good approx¬ 
imations in some applications. 

The characteristics of a good auxiliary model dif¬ 
fer between the ABC II and pdBIL methods. In the 
context of ABC II, we simply require a good sum¬ 
marisation of the data, that is, a low-dimensional 
summary statistic that hopefully carries most of the 
information in the observed data. Therefore, we feel 
that it is useful if the auxiliary model in this con¬ 
text provides a good fit to the data and is parsimo¬ 
nious, so that the essential features of the data are 
described well and as succinctly as possible. This 
is independent of the process for selecting a gen¬ 
erative model. Therefore, the same auxiliary model 
should be used regardless of which generative model 
is fitted to the data. For pdBIL, we require a flexi¬ 
ble auxiliary model that can mimic the behaviour of 


the generative model for different values of 6 within 
the posterior support. Here, it is not necessary for 
the auxiliary model to provide a good fit to the data 
considering the fact that the generative model be¬ 
ing proposed might be mis-specified. The auxiliary 
model chosen for pdBIL may alter depending on the 
generative model being proposed. In our examples, 
the generative model is either known or provides a 
good fit to the data. In such cases, it would not be 
uncommon to choose the same auxiliary model for 
the ABC II and pdBIL methods. 

The conditions required for pdBIL to produce ex¬ 
act results are very strong and finding an auxiliary 
model that is sufficiently flexible so that the auxil¬ 
iary likelihood can mimic the generative likelihood 
could be difficult in practice. In some applications, 
an auxiliary model that is a simplified version of the 
generative model may be specified where the pa¬ 
rameter of each model has the same interpretation. 
For example, the auxiliary model for a continuous 
time Markov jump process may be its corresponding 
linear noise approximation. In such situations, the 
pdBIL method is unlikely to perform well whilst it 
remains possible that such an approximate model 
could produce useful summary statistics for ABC 
even though the auxiliary model would not fit the 
data well. Jiang and Turnbull (2004) show that II 
can work well in the classical framework when the 
auxiliary model is a simplified version of the gen¬ 
erative model. Further research is required in the 
Bayesian setting. 

An additional advantage of the ABC II approach 
over pdBIL is the extra flexibility of being able to 
accommodate additional summary statistics that do 
not involve an auxiliary model, since this method 
belongs in the more general npsBIL class (see Wood 
(2010), for an example where the summary statistic 
is a combination of auxiliary parameter estimates 
and other summary statistics). Jiang and Turnbull 
(2004) and Heggland and Frigessi (2004) consider 
II applications in a classical framework where the 
comparison of observed and simulated data is made 
on the basis of both an auxiliary model and supple¬ 
mentary summary statistics. 

7. EXAMPLES 
7.1 Toy Example 

In this example, we consider a simple model so 
that exact Bayesian inferences are trivially obtained. 
Our intention here is to investigate the theoretical 
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considerations in Section 4. In particular, we show 
that when the auxiliary model is reasonable, pdBIL 
produces better approximations as the size of sim¬ 
ulated datasets goes beyond that of the observed 
data and as a useful by-product increases the ac¬ 
ceptance probability of the MCMC moves. We also 
demonstrate empirically that unfortunately ABC 
approaches (including those using II to obtain sum¬ 
mary statistics) do not possess this same desirable 
property as n is increased. Additionally, we investi¬ 
gate the output of pdBIL when the auxiliary model 
is poorly chosen. 

Here, the data are N = 100 independent draws 
from a Poisson distribution with a mean of A = 
30, y = (yi,... ,yioo) ‘~ d ' Po(30). The prior is A ~ 
T(a,/3) (where a = 30 and /3 = 1), which results in 
a A|y ~ T(30 + X^=i Vu 101) posterior. For such a 
relatively large value for the mean of the Poisson 
distribution, a normal approximation with mean, //, 
and variance, r, will be reasonable. We use this nor¬ 
mal distribution as the auxiliary model. Here, the 
auxiliary likelihood, MLE and score are trivial to 
compute. The Anderson-Darling test for normality 
produced a p-value of about 0.576, which indicates 
no evidence against the assumption that the normal 
auxiliary model provides a good description of the 
data. 

The summary statistic based on this auxiliary 
model includes the sample mean, y, which is a suf¬ 
ficient statistic for the generative model. Thus, the 
ABC II approaches can be expected to produce es¬ 
sentially exact inferences (excluding Monte Carlo er¬ 
ror) as long as the ABC tolerance is low enough. 
As demonstrated in Figure 2, this is the case. Such 
sufficiency is not usually achieved in practice. How¬ 
ever, it can be seen that the ABC posterior is grossly 
over-precise when the size of the simulated datasets 
is increased to 1000 (i.e., n = 10). 

In the limit as n —>■ oo, the pdBIL method boils 
down to a N( A, A) distribution approximating a 
Po(A) distribution. The central limit theorem states 
that the normal approximation improves as A in¬ 
creases. Since A = 30, pdBIL can never target the 
true posterior. The pdBIL target distribution (for 
n —>• oo) is proportional to A 0 ^^ 2-1 exp(—(/3 + 
N/2)\) exp(—(2A) _1 J2iLi Vi) while the true poste¬ 
rior is proportional to A“ + ^ i=1?/i ” 1 exp (—(/3 + N)\). 
Figure 2(d) demonstrates a small amount of bias 
for the pdBIL method (this is an illustration of Re¬ 
mark 4). 


Figure 2(d) presents the results for the pdBIL 
method based on simulated dataset sizes of n = 1 
and n = 10 (results for n = 100 and n = 1000 are 
even closer to the true posterior but are not shown 
on the figure). It is evident from the figure that 
a more precise posterior is achieved when using 
larger simulated datasets, without necessarily over¬ 
shooting the true posterior. Additionally, there was 
an increase in the MCMC acceptance rate as n in¬ 
creased. For the n values investigated here, the ac¬ 
ceptance rates were roughly 46%, 67%, 72% and 73% 
for increasing n. These acceptance rates are very 
high, especially relative to ABC algorithms which 
generally suffer from quite low acceptance probabil¬ 
ities. This example demonstrates that better infer¬ 
ences using pdBIL can be obtained by increasing 
the size of the simulated dataset beyond that of the 
observed. Unfortunately, ABC inferences that use a 
simulated data size larger than that of the observed 
data cannot be trusted in the same way (see Re¬ 
mark 2). 

The reason for improved inferences from pdBIL 
as n is increased is apparent from Figure 3. Here, it 
can be seen from increasing n the log-likelihoods of 
the generative and auxiliary models are becoming 
more correlated with the slope of the relationship 
becoming approximately one. 

Figure 4(a) shows the true Po(A) and auxiliary 
A r (A,A) log-likelihood values for A within the 99% 
highest prior density region. The vertical lines in¬ 
dicate the bounds of a 99% credible interval based 
on the true posterior. It can be seen that the aux¬ 
iliary log-likelihood is a poor approximation to the 
true log-likelihood in regions with negligible poste¬ 
rior support. This is even the case for larger A values 
where it would be expected that a normal approx¬ 
imation would be more appropriate. However, the 
normal approximation will perform relatively poorly 
in the tails of the distribution. It is evident that the 
auxiliary likelihood acts as a useful replacement like¬ 
lihood in the region of high posterior support [see 
Figure 4(b)], and this is enough to result in a good 
approximation of the true posterior for large n. 

Finally, we investigate the output from pdBIL 
when the auxiliary model is chosen poorly. Fig¬ 
ure 4(c) shows the results for when the auxil¬ 
iary model is N(/i,tq), where tq is fixed. Here, 
the pdBIL posterior as n —>• oo is proportional to 
A" -1 exp(—(/3 - Tq 1 Vi)^) exp(—0.5AIr 0 ” 1 A 2 ). 
Here, we consider tq = 49 (over-dispersed) and tq = 
16 (under-dispersed). The results show over-precise 
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28 30 32 28 30 32 


(a) ABC IP 


(b) ABC IS 




28 30 32 28 30 32 

(c) ABC IL (d) pdBIL 


Fig. 2. Posterior distributions for X, the parameter of the Poisson example for when (a) ABC IP, (b) ABC IS, (c) ABC IL 
and (d) pdBIL is applied (on-line figure in colour). 


and conservative results in the under-dispersed 
and over-dispersed case, respectively. The under¬ 
dispersed and over-dispersed auxiliary models have 
thinner and fatter tails, respectively, than the like¬ 
lihood of the generative model in the parameter 
space well supported by the posterior distribution 
(see Figure 4). In these cases, the auxiliary model is 
not providing a useful replacement likelihood. Just 
by chance ABC II is still exact here as e —> 0 since 
the parameter estimate for /r is a sufficient statistic 
for A. 

7.2 g-and-k Example 

7.2.1 Models and data Quantile distributions (or 
functions) represent a class of distributions that are 
defined in terms of their quantile function. Such 


functions can be formulated to create more flexi¬ 
ble distributions than other standard distributions. 
In this example, the focus is on the g-a,nd-k dis¬ 
tribution described in, for example, Rayner and 
MacGillivray (2002) (the reader is also referred 
to the references therein). This quantile function, 
which can also be interpreted as a transformation of 
a standard normal random variate, has the following 
form: 


Q{z(p)]0) 

( 2 ) 


, 1-exp (~gz{p)) \ 
v 1 + exp (-gz(p))J 

■ (1 + z{p) 2 ) k z(p). 


Here, p denotes the quantile of interest while z(p) 
represents the quantile function of the standard 
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(a) n = 1 


(b) n = 10 




(c) n = 100 


(d) n = 1000 


Fig. 3. Comparison of the generative and auxiliary log-likelihoods for the toy example calculated during the MCMC pdBIL 
algorithm with different values of n /(a) n = 1, (b) n = 10, (c) n = 100, (d) n = 1000y. 


normal distribution. The model parameter is 6 = 
(a,b,c,g,k), though common practice is to fix c at 
0.8, which we do here (see Rayner and MacGillivray 
(2002), for a justification). The likelihood function 
can be computed numerically, although this is more 
expensive than model simulation which is cheaply 
implemented for quantile distributions via the inver¬ 
sion method. Full likelihood-based inference is more 
expensive than the simulation-based approaches for 
the relatively large dataset considered here. 

The observed dataset consists of 10,000 indepen¬ 
dent draws from the g-and-k distribution with a = 3, 
b= 1, c = 0.8, g = 2 and k = 0.5 (same as considered 
in Drovandi and Pettitt (2011)). A nonparanretric 


estimate of the probability density function based 
on these samples is shown in Figure 5. The data 
exhibit significant skewness and kurtosis. 

We use a three component normal mixture model 
with 8 parameters as the auxiliary model. A mixture 
model is a suitable choice for an auxiliary distribu¬ 
tion since it can be made arbitrarily flexible whilst 
maintaining a tractable likelihood function. There¬ 
fore, auxiliary MLEs are computationally easy to 
obtain [here we use the Expectation-Maximisation 
(EM) algorithm] and the subsequent likelihood can 
be evaluated cheaply. On the other hand, mixture 
models can he highly irregular and the MLE is not 
consistent in general. The invariance of the likeli- 
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Fig. 4. Investigating different auxiliary models for the toy example, (a) Comparison of the true and auxiliary log-likelihoods 
values for A values within the 99% highest prior density region, (b) Comparison of true and auxiliary log-likelihoods for different 
choices of the auxiliary model. The vertical lines in (a) and (b) indicate the bounds of a 99% credible interval based on the 
true posterior, (c) Comparison of the posterior distributions for the three different auxiliary models (on-line figure in colour). 


hood to a re-labelling of the components causes an 
immediate issue for ABC IP, which requires a unique 
auxiliary parameter estimate. In an attempt to over¬ 
come this, we post-process the mixture model pa¬ 
rameter estimates generated throughout the ABC 
IP algorithm by ordering them based on the com¬ 
ponent means. Since pdBIL and ABC IL use the 
likelihood of the auxiliary model, they more natu¬ 
rally overcome the label switching issue. However, 
the mixture model can give other numerical issues 
such as those resulting from infinite likelihoods. This 
would create serious issues for methods that use the 
auxiliary likelihood (the auxiliary likelihood would 


not be unique). From investigations on the dataset 
here, it appears that the likelihood is well behaved 
and that the modes in the likelihood correspond only 
to re-labelling of components. Therefore, we proceed 
with ABC IL and pdBIL with caution. The ABC IS 
method, based on the score vector, appeared to not 
have any difficulties accommodating the auxiliary 
mixture model. 

From Figures 5(a) and 5(b), it can be seen that 
there is a correspondence between both the den¬ 
sities and the cumulative distribution functions of 
the mixture model and the data. However, we per¬ 
formed a hypothesis test to assess the goodness- 
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(a) (b) 


Fig. 5. (a) Nonparametric estimate of the density function based on a dataset simulated from the g-and-k distribution together 

with the density of a three component mixture of normals estimated from the data, (b) Absolute value of the difference between 
the theoretical c.d.f. of the three component mixture model and the empirical c.d.f. of the data (on-line figure in colour). 


of-fit of the three component mixture model with 
a parameter given by the MLE. The test-statistic 
was the Kolmogorov-Smirnov statistic that com¬ 
putes the maximum absolute difference between the 
theoretical and empirical c.d.f.s. To avoid any dis¬ 
tributional assumption about this test-statistic, we 
simulated 10,000 values of this statistic under the 
assumption that the mixture model is correct. We 
found that the observed test-statistic was exceeded 
0.25% of the time, indicating strong evidence against 
the mixture providing a good fit to the data. Fig¬ 
ure 5(b) shows the differences between the empirical 
and theoretical c.d.f.’s. However, from Figure 5(a) it 
is evident that the mixture model can explain sev¬ 
eral features of the true model, and since the dataset 
size is large there is a high probability of detecting a 
difference. Our results below show that we are able 
to obtain quite accurate posterior distributions with 
the pBII methods despite the lack of fit suggested 
by the hypothesis test. In Appendix B of the sup¬ 
plemental article (Drovandi, Pettitt and Lee, 2014), 
we present results from using a four component mix¬ 
ture model. Unfortunately we found this was sub¬ 
stantially more expensive to apply and resulted in 
some numerical problems. 

7.2.2 Results The proposal distribution in the 
MCMC for the pdBIL algorithm was guided using 
the results in Drovandi and Pettitt (2011), who anal¬ 
ysed the same data via a traditional ABC approach 


that used robust measures of location, scale, skew¬ 
ness and kurtosis as the summary statistics. 

pdBIL was run using n values of 1, 2, 4, 10, 20 
and 50 for a number of iterations given by 1 million, 
500,000, 500,000, 200,000, 100,000 and 75,000, re¬ 
spectively. The MCMC acceptance probabilities ob¬ 
tained were about 2.8%, 5%, 7.1%, 13.1%, 18.5% 
and 20.8%, respectively. The average effective sam¬ 
ple size (ESS, averaged over the four parameters) di¬ 
vided by the computing time (in hours) were roughly 
63, 127, 106, 124, 70 and 41, respectively. This 
demonstrates how pdBIL is still feasible as n in¬ 
creases to a certain point. However, for very large n 
the computation becomes unmanageable. 

Figure 6 shows the results for n = 1, n = 10 and 
n = 50 (the results for n = 20 and n = 50 were quite 
similar). A very time consuming exact MCMC al¬ 
gorithm was run for 10,000 iterations to obtain a 
gold-standard (producing an average ESS per hour 
of 6). The results show an increase in precision of 
the pdBIL posteriors as n increases. The results for 
a and b are very accurate, while the pdBIL posteri¬ 
ors for g and k show some bias (also with a loss of 
precision for g ). 

ABC IP and ABC IL were run for 1 million it¬ 
erations with a tolerance tuned to achieve an ac¬ 
ceptance rate of about 1%. Due to the ABC IS 
method being so much faster than the other pBII 
approaches, we aimed for a relatively lower ABC 
tolerance and ran the algorithm for more iterations. 
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Fig. 6. Posterior distributions for the parameters of the g-and-k model based on the pdBIL approach with n = 1 (dash), 
n = 10 (dot-dash) and n = 50 (dot). Also shown are results based on using the true likelihood (solid) (on-line figure in colour). 


More specifically, 7 million iterations were used and 
the ABC tolerance chosen resulted in an acceptance 
rate of 0.3%. We also applied a regression adjust¬ 
ment to the (appropriately thinned) ABC II samples 
using the approach of Beaumont, Zhang and Balding 
(2002) in an attempt to eliminate the effect of the 
ABC tolerance. In order to apply regression adjust¬ 
ment for ABC IL, the same post-processing proce¬ 
dure used for ABC IP was required. Without regres¬ 
sion adjustment [see Figure 2 of the supplemental 
article (Drovandi, Pettitt and Lee, 2014)], the ABC 
IS method gave slightly better results than other 
ABC II methods, which could be due to the ability 
of ABC IS getting to a lower ABC tolerance. The 
unadjusted ABC IL results were also slightly bet¬ 
ter than the unadjusted ABC IP results. ABC IS 


produced an average ESS per hour of 90 while the 
corresponding number was 50 and 30 for ABC IP 
and ABC IL, respectively, showing that the ABC 
IS method required less time to produce a better 
approximation. Regression adjustment offered im¬ 
provement to all the ABC II methods. We com¬ 
pared the pBII approaches with the ABC results 
of Drovandi and Pettitt (2011). It should be noted 
that we applied a local regression adjustment to the 
ABC results here as we found some improvement for 
the parameters a and g (results were very similar for 
b and k relative to those obtained in Drovandi and 
Pettitt (2011)). The results are shown in Figure 7. 
Overall, the pBII results present a marked improve¬ 
ment over the ABC analysis of Drovandi and Pettitt 
(2011), with g seemingly the most difficult parame- 
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(c) 9 


(d) k 


Fig. 7. Posterior distributions for the parameters of the g-and-k model based on the ABC approach of Drovandi and Pettitt 
(2011) (dash), ABC IS (star), ABC IL (dot-dash), ABC IP (circle) and pdBIL with n — 50 (dot) and results from using the 
true likelihood (solid). Note that regression adjustment has been applied to all ABC results (on-line figure in colour). 


ter to estimate. The ABC II methods with regression 
adjustment produced very similar results. Taking 
into account accuracy and computational efficiency, 
ABC IS with regression adjustment is probably the 
preferred method. When a four component auxiliary 
model was used [Appendix B of the supplemental ar¬ 
ticle (Drovandi, Pettitt and Lee, 2014)], the ABC II 
methods with regression adjustment produced sim¬ 
ilar results and outperformed pdBIL in terms of ac¬ 
curacy. Further, the ABC IS approach was able to 
avoid the heavy computation associated with fitting 
the four component mixture model at every itera¬ 
tion, and thus also avoided other numerical issues 
such as the EM algorithm converging to potentially 
local optima. The ABC II regression adjustment re¬ 


sults showed some improvement for g and k when 
going from the three to four component mixture 
model. 

7.3 Macroparasite Example 

7.3.1 Models and data Drovandi, Pettitt and 
Faddy (2011) developed an ABC IP approach to 
estimate the parameters of a stochastic model 
of macroparasite population evolution developed 
by Riley, Donnelly and Ferguson (2003) (see also 
Michael et al. (1998)). Data was collected indepen¬ 
dently on 212 cats, who were initially injected with 
a certain number of juvenile Brugia pahangi para¬ 
sites. Some time after each cat was sacrificed, the 
number of parasites that had reached maturity were 
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counted and recorded (see Denham et al. (1972)). 
Drovandi, Pettitt and Faddy (2011) discovered that 
a beta-binomial model provided a good description 
of the data. Below provides a brief review of the 
generative and auxiliary models, with the reader re¬ 
ferred to Drovandi, Pettitt and Faddy (2011) for 
more details. 

At time t, any host is described by three random 
variables {M(t),L(t),I(t)}, where M(t) is the num¬ 
ber of mature parasites, L{t) is the number of larvae 
and I{t) is a discrete version of the host’s immunity. 

It is assumed that each larva matures at a con¬ 
stant rate of 7 per day. Larvae die at a rate 
Pl + &I{t) per larva where pl represents the rate 
at which natural death of larvae occurs and f3 is 
a rate parameter that describes additional death 
of larvae due to the immune response of the host. 
The acquisition of immunity is assumed to be de¬ 
pendent only on the number of larvae and oc¬ 
curs at rate vL{t), and a host loses immunity 
at a rate pi per unit of immunity. Mature para¬ 
sites die at a rate of gM adults per day. Parame¬ 
ters 7 and \im have been previously estimated at 
0.04 (Suswillo, Denham and McGreevy, 1982) and 
0.0015 (Michael et al., 1998), respectively. 

The data were modelled via a continuous time 
discrete trivariate Markov process. Given current 
values of the states at time t, M(t) = i, L(t) = j, 
M(t) = k, and a small time increment At the tran¬ 
sition probabilities at time t + Aj are given by 

p(i + l,j-l,k) = 'yjA t + o(A t ), 

p(i,j - 1 ,k) = (p L +/3k)jA t + o(At), 

(3) p(i - 1, j,k) = UMi^t + o(At), 

p(i,j,k + l) = vjA t + o(Af), 

p(i,j,k- 1) = mkA t + o(A t ), 

and the probability of remaining in the same state is 
one minus the sum of the above probabilities. Only 
the final mature count is observed whilst the immu¬ 
nity and larvae counts are unobserved throughout 
the process. Moreover, the immune response vari¬ 
able I(t) is unbounded. Data generative likelihood- 
based approaches appear infeasible due to compu¬ 
tational issues (see Drovandi, Pettitt and Faddy 
(2011)). Simulation is straightforward via the al¬ 
gorithm of Gillespie (1977). The prior distributions 
are: v ~ 17(0,1), pj ~ 17(0,2), pl ~ 17(0,1) and f3 ~ 
U( 0 , 2 ). 


Here, we denote the observed data as y = (mi,..., 
77 x 212 ) where m, is the mature count for the zth host. 
Covariates for the zth host are given by lj (initial 
larvae count) and t t (sacrifice time). 

For the auxiliary model, Drovandi, Pettitt and 
Faddy (2011) capture the over dispersion via a beta- 
binomial regression model and take into account the 
effect that L and lj have on m;. Denote a, and /Jj 
as the beta-binomial parameters for the ith host. 
More specifically, the zth observation has the fol¬ 
lowing probability distribution: 


(4) p(mi\ai,Pi) 


k \ B(mj + ajjj- mj + fa) 

m) B{pn,Pi) 


where denotes the beta function. Consider a 

re-parameterisation in terms of a proportion, pi = 
cni/(ai + fa), and over-dispersion, & = l/(a* + /%), 
parameter. The auxiliary model relates these param¬ 
eters to the covariates via 


logit (pi) = f p (ti,k ), 
log(&) = fz(U,li), 


where 


fdtiJi) = Mi) 


and 


7100, if k < 100 

7200, if k > 100 


fp{ti,k ) = f P {ti) 

= /3o + /3i(log(ti)-iog( i )) 

+ /?2 (log(ti) — log(t)) 2 . 


Hence, the auxiliary model has the parameter <f> = 
(/Jo, /3i, /?2,7100,7200) while the generative model has 
the parameter 0 = (17 pj, pi, j3). 

Using the approach outlined in Appendix C of the 
supplemental article (Drovandi, Pettitt and Lee, 
2014), we obtained goodness-of-fit p- values of 0.37 
and 0.47, indicating no evidence against the beta- 
binomial model providing a good description of the 
data. Drovandi, Pettitt and Faddy (2011) use the 
AIC to select this auxiliary model over competing 
auxiliary models. 


7.3.2 Results for simulated data For validation of 
the pBII methods for this example, data was sim¬ 
ulated using the same experimental design as the 
observed data based on the parameter configuration 
estimated by Riley, Donnelly and Ferguson (2003); 
v = 0.00084, p! = 0.31, p L = 0.0011 and 0 = 1.1. We 
found that the pBII methods were able to recover 
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Fig. 8. Posterior distributions for the parameters [ (a) v, (b) pi, (c) pl, (d) j3] of the macroparasite model based on a pdBIL 
approach with n= 1 (solid), n = 20 (dot-dash) and n = 50 (dash) (on-line figure in colour). 


the parameters v and pp well, pi was determined 
less precisely and /3 was not recovered. The data are 
not particularly informative about pj and (3 (see 
Drovandi, Pettitt and Faddy (2011), for more dis¬ 
cussion). The ABC IS gave the most precise pos¬ 
terior distributions for v and pp out of the pBII 
methods. For full details on the analysis of this sim¬ 
ulated data, see Appendix C of the supplemental 
article (Drovandi, Pettitt and Lee, 2014). 

7.3.3 Results for real data Here, we used the ABC 
IP results of Drovandi and Pettitt (2011) to form an 
MCMC proposal distribution. The pdBIL method 
with n = 1, n = 20 and n = 50 was run for 1 mil¬ 
lion, 100,000 and 50,000 iterations, respectively. Ac¬ 
ceptance probabilities of roughly 1.4%, 23.5% and 


28.2%, respectively, were obtained. The average ESS 
per hour was 37, 79 and 58, respectively. The sub¬ 
stantial increase in acceptance probability allowed 
us to use fewer iterations. The results are shown in 
Figure 8. The figures suggest that we are not able 
to gain any additional information from the data 
for the parameter v from the pdBIL approach by 
increasing n. However, an increase in precision is 
obtained for pp as 77 is increased. The posteriors are 
shifted slightly for the other two parameters, how¬ 
ever, they are still largely uninformative, although 
the posterior for pj for large n may indicate some 
preference for smaller values of pg. 

We now compare the results of pdBIL with ABC. 
ABC IP and ABC IL MCMC algorithms were all run 
for 1 million iterations. The ABC IP and ABC IL 
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(C) HL 


(d )P 


Fig. 9. Posterior distributions for the parameters [ (a) v, (b) pi, (c) pl, (d) /3] of the macroparasite model based on ABC 
IP (dash), ABC IS (dot), ABC IL (dot-dash) and pdBIL (solid) (on-line figure in colour). 


tolerances were chosen so that the acceptance rate 
was about 1.5%. Due to the increased computational 
efficiency of ABC IS, we ran this algorithm for 20 
million iterations and tuned the tolerance to obtain 
an acceptance rate of about 0.1%. ABC IP and ABC 
IL used about 15 hours of computing time while 
ABC IS only required 11 hours even though 20 times 
more iterations were run. 

The estimated posterior densities (after appropri¬ 
ate thinning) for the different approaches are pre¬ 
sented in Figure 9. In general, the data are not in¬ 
formative about the pi and (5 parameters, so we turn 
our focus to the parameters v and pp. We note that 
it is difficult to compare the approximations without 
having available a gold standard. It can be seen that 
pdBIL produces the most precise inferences for the 


parameters v and pp. Despite being able to reduce 
the ABC tolerance, the ABC IS method appears to 
be the least precise. This is in contrast to the results 
for the simulated data in Appendix C of the sup¬ 
plemental article (Drovandi, Pettitt and Lee, 2014), 
where ABC IS produced the most precise results. 

Regression adjustment was also applied to the 
ABC II methods in an attempt to reduce the ef¬ 
fect of the ABC tolerance. These adjustments were 
applied individually to — log(i') and y/Jtp [see Ap¬ 
pendix C of the supplemental article (Drovandi, Pet¬ 
titt and Lee, 2014)] and the results are shown in 
Figure 10. The regression adjustment does increase 
the precision of the ABC II posteriors. The regres¬ 
sion adjustment appears to shift the modes of the 
ABC II results slightly for v. For pp. the regression 
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Fig. 10. Posterior distributions for the parameters [ (a) v and (b) p,L] of the macroparasite model based on ABC IP (dash), 
ABC IS (dot), ABC IL (dot-dash) and pdBIL (solid). Regression adjustment has been performed on the output of the ABC II 
approaches (on-line figure in colour). 


adjustment brings the ABC II results closer to that 
obtained by pdBIL for n = 50. 

8. DISCUSSION 

This paper has provided an extensive comparison 
of pBII methods, from theoretical, practical and em¬ 
pirical perspectives. We discovered that the pdBIL 
method of Gallant and McCulloch (2009) is funda¬ 
mentally different to ABC II approaches developed 
in the literature. More specifically, we showed that 
pdBIL can produce better approximations by in¬ 
creasing the size of the simulated datasets as long 
as the auxiliary model provides a useful replacement 
likelihood for the generative likelihood for a variety 
of 6 values. In contrast, ABC methods (including 
those that use II to form the summary statistic) 
should simulate datasets the same size as the ob¬ 
served. The pdBIL method has the additional ad¬ 
vantage of not having to determine an appropriate 
ABC tolerance. Furthermore, we found that increas¬ 
ing the size of the simulated dataset beyond that of 
the observed does not necessarily make computa¬ 
tion infeasible due to the increase in statistical ef¬ 
ficiency. However, it is of interest to determine the 
size of the simulated dataset upon which negligible 
improvement will be obtained. This requires further 
research. 

We have also established that BIL is a rather 
flexible framework since the synthetic likelihood ap¬ 
proach of Wood (2010) is a pBIL method that ap¬ 


plies a parametric auxiliary likelihood to the sum¬ 
mary statistic likelihood while ABC can be recov¬ 
ered by selecting a specific nonparametric auxiliary 
model. Our focus in this paper has been on the pBIL 
method where a parametric auxiliary model is pro¬ 
posed for the full data likelihood. However, the ideas 
in this paper may carry over to when the auxiliary 
model is applied to a summary statistic likelihood 
as in Wood (2010). 

For the pBIL method to have some chance of a 
good approximation to the true posterior for the 
specified generative model, it is important that the 
auxiliary model is able to well fit data simulated 
from the generative model for parameter values 
within nonnegligible posterior regions, at least in 
the majority of simulations. It would be possible to 
perform a goodness-of-ht test on the auxiliary model 
for every dataset generated from the proposed model 
during the MCMC pBIL algorithm in order to assess 
the usefulness of the auxiliary model in the context 
of the pBIL method. This is the subject of further 
research. 

In this paper, we have not addressed the issue of 
which ABC II method provides the best approxi¬ 
mation. ABC IS is much faster (when the auxiliary 
score vector is analytic) and requires only weak as¬ 
sumptions, but did not always outperform the other 
ABC II methods in the examples considered in this 
paper. The ABC IP and ABC IL methods differ only 
in their discrepancy function and it is not clear if one 
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discrepancy function dominates the other across ap¬ 
plications. Furthermore, it remains unknown if the 
auxiliary parameter estimate or auxiliary score car¬ 
ries the most information in the observed data. It 
could be that the optimal choice of ABC II ap¬ 
proach is problem dependent. Until further research 
is conducted, we suggest trying all three methods 
(assuming that ABC IP and ABC IL are computa¬ 
tionally feasible). One approach to speed up ABC 
IP and ABC IL might be to start with a compu¬ 
tationally simple but consistent estimator (e.g. the 
method of moments) and apply one iteration of a 
Newton-Raphson method to produce an asymptot¬ 
ically efficient estimator (Cox and Hinkley (1979), 
page 308) in a timely manner. 

From a practical perspective, these methods have 
led to improved approximate analyses for two sub¬ 
stantive problems compared with that obtained in 
Drovandi, Pettitt and Faddy (2011) and Drovandi 
and Pettitt (2011). Across applications considered 
in this paper, ABC IS was the most computationally 
efficient and led to good posterior approximations. 

Overall, pdBIL avoids having to choose an ABC 
discrepancy function and the ABC tolerance. If an 
auxiliary model can be proposed that satisfies a 
rather strong condition, more precise inferences can 
be obtained by taking n large, which we showed is 
still computationally feasible with MCMC pdBIL up 
to a point. However, ABC II appears to provide a 
more general framework for pBII problems, due to 
the extra flexibility of being able to incorporate ad¬ 
ditional summary statistics outside the set formed 
from the auxiliary model and potentially providing 
better approximations when the auxiliary model is a 
simplified version of the generative model. It is this 
extra flexibility that may see ABC II as the method 
of choice as ever-increasingly complex applications 
are encountered. 
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SUPPLEMENTARY MATERIAL 

Supplement to “Bayesian Indirect Inference Us¬ 
ing a Parametric Auxiliary Model” 

(DOI: 10.1214/14-STS498SUPP; .pdf). This mate¬ 
rial contains a simple example to supplement Sec¬ 
tion 3.1 and additional information and results to 
supplement the examples in Sections 7.2 and 7.3. 
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